PROCESSING OF MULTI-CHANNEL SIGNALS

BACKGROUND OF THE INVENTION 
Field Of The Invention 

[0001] The present invention relates to the processing of audio
signals and, more particularly, the coding of multi-channel audio
signals.

Description Of The Related Art

[0002] Parametric multi-channel audio coders generally transmit
only one full-bandwidth audio channel combined with a set of
parameters that describe the spatial properties of an input signal.
For example, Fig. 1 shows the steps performed in an encoder 10
described in International Application No. WO2003/90208, filed
April 22, 2003 (Attorney Docket No. PHNL021156).

[0003] In an initial step S1, input signals L and R are split
into subbands 101, for example, by time-windowing followed by a
transform operation. Subsequently, in step S2, the level difference
(ILD) of corresponding subband signals is determined; in step S3,
the time difference (ITD or IPD) of corresponding subband signals
is determined; and in step S4, the amount of similarity or
dissimilarity of the waveforms which cannot be accounted for by
ILDs or ITDs is described. In the subsequent steps S5, S6, and S7,
the determined parameters are quantized.

[0004] In step S8, a monaural signal S is generated from the
incoming audio signals, and finally, in step S9, a coded signal 102
is generated from the monaural signal and the determined spatial
parameters.

[0005] Fig. 2 shows a schematic block diagram of a coding system
comprising the encoder 10 and a corresponding decoder 202. The
coded signal 102, comprising the sum signal S and spatial
parameters, is communicated to a decoder 202. The signal 102 may
be communicated via any suitable communications channel 204.
Alternatively, or additionally, the signal may be stored on a
removable storage medium 214, which may be transferred from the
encoder to the decoder.

[0006] Synthesis (in the decoder 202) is performed by applying
the spatial parameters to the sum signal to generate left and right
output signals. Hence, the decoder 202 comprises a decoding module
210 which performs the inverse operation of step S9 and extracts
the sum signal S and the parameters P from the coded signal 102.
The decoder further comprises a synthesis module 211 which recovers
the stereo components L and R from the sum (or dominant) signal and
the spatial parameters.

[0007] One of the challenges is to generate the monaural signal
S, step S8, in such a way that, on decoding into the output
channels, the perceived sound timbre is exactly the same as for the
input channels.






PHNL 030241 



[0008] Several methods of generating this sum signal have been
suggested previously. In general, these methods compose a mono
signal as a linear combination of the input signals. Particular
techniques include:

1. Simple summation of the input signals. See, for example,
'Efficient representation of spatial audio using perceptual
parametrization', by C. Faller and F. Baumgarte, WASPAA'01,
Workshop on applications of signal processing on audio and
acoustics, New Paltz, New York, 2001.

2. Weighted summation of the input signals using principal
component analysis (PCA). See, for example, International Patent
Application No. WO2003/85645, filed March 20, 2003 (Attorney Docket
No. PHNL020284) and International Patent Application No.
WO2003/85643, filed March 20, 2003 (Attorney Docket No.
PHNL020283). In this scheme, the squared weights of the summation
sum up to one and the actual values depend on the relative energies
in the input signals.

3. Weighted summation with weights depending on the time-domain
correlation between the input signals. See, for example, 'Joint
stereo coding of audio signals', by D. Sinha, European Patent
Application No. EP 1 107 232 A2. In this method, the weights sum
to +1, while the actual values depend on the cross-correlation of
the input channels.

4. U.S. Patent 5,701,346 to Herre et al. discloses weighted
summation with energy-preservation scaling for downmixing left,
right, and center channels of wideband signals. However, this is
not performed as a function of frequency.

[0009] These methods can be applied to the full-bandwidth signal
or can be applied on band-filtered signals which all have their own
weights for each frequency band. However, all of the methods
described have one drawback. If the cross-correlation is frequency-
dependent, which is very often the case for stereo recordings,
coloration (i.e., a change of the perceived timbre) of the sound of
the decoder occurs.

[0010] This can be explained as follows: For a frequency band
that has a cross-correlation of +1, linear summation of two input
signals results in a linear addition of the signal amplitudes, and
the resultant energy is the square of the summed amplitude. (For
two in-phase signals of equal amplitude, this results in a doubling
of amplitude with a quadrupling of energy.) If the cross-
correlation is 0, linear summation results in less than a doubling
of the amplitude and less than a quadrupling of the energy; for
uncorrelated signals of equal power, the energy merely doubles.
Furthermore, if the cross-correlation for a certain frequency band
amounts to -1, the signal components of that frequency band cancel
out and no signal remains. Hence, for simple summation, the
frequency bands of the sum signal can have an energy (power)
between 0 and four times the power of the two input signals,
depending on the relative levels and the cross-correlation of the
input signals.
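
The dependence of the summed energy on correlation can be checked numerically. The following is a minimal sketch, not part of the described coder: the test signals (one cycle of a sinusoid, its quadrature copy and its inverted copy) and all names are illustrative assumptions chosen to give correlations of +1, 0 and -1.

```python
import math

def energy(x):
    """Sum of squared samples of a real signal."""
    return sum(v * v for v in x)

def sum_energy(left, right):
    """Energy of the sample-wise sum of two equal-length real signals."""
    return sum((l + r) ** 2 for l, r in zip(left, right))

n = 64
# Reference: one full cycle of a unit-amplitude sinusoid.
ref = [math.sin(2 * math.pi * t / n) for t in range(n)]
in_phase = list(ref)                                             # correlation +1
quadrature = [math.cos(2 * math.pi * t / n) for t in range(n)]   # correlation 0
inverted = [-v for v in ref]                                     # correlation -1

e_in = energy(ref)  # energy of ONE input channel
# Energy of the plain sum, relative to the energy of one input channel.
ratios = {
    "+1": sum_energy(ref, in_phase) / e_in,
    "0": sum_energy(ref, quadrature) / e_in,
    "-1": sum_energy(ref, inverted) / e_in,
}
```

Relative to one input channel, the sum carries four times the energy for a correlation of +1, twice the energy for a correlation of 0, and none for a correlation of -1, which is the spread that a frequency-dependent correlation turns into coloration.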

SUMMARY OF THE INVENTION
[0011] The present invention attempts to mitigate this problem
and provides a method of generating a monaural signal (S)
comprising a combination of at least two input audio channels
(L, R), comprising the steps of:

[0012] for each of a plurality of sequential segments (t(n)) of
said audio channels (L, R), summing (46) corresponding frequency
components from respective frequency spectrum representations for
each audio channel (L(k), R(k)) to provide a set of summed
frequency components (S(k)) for each sequential segment;

[0013] for each of said plurality of sequential segments,
calculating (45) a correction factor (m(i)) for each of a plurality
of frequency bands (i) as function of the energy of the frequency
components of the summed signal in said band
($\sum_{k \in i} |S(k)|^2$) and the energy of said frequency
components of the input audio channels in said band
($\sum_{k \in i} \{ |L(k)|^2 + |R(k)|^2 \}$); and

[0014] correcting (47) each summed frequency component as a
function of the correction factor (m(i)) for the frequency band of
said component.

[0015] If different frequency bands tended to, on average, have
the same correlation, then one might expect that over time,
distortion caused by such summation would average out over the
frequency spectrum. However, it has been recognized that, in
multi-channel signals, low frequency components tend to be more
correlated than high frequency components. Therefore, it will be
seen that without the present invention, summation, which does not
take into account frequency dependent correlation of channels,
would tend to unduly boost the energy levels of more highly
correlated and, in particular, psycho-acoustically sensitive low
frequency bands.

[0016] The present invention provides a frequency-dependent
correction of the mono signal where the correction factor depends
on a frequency-dependent cross-correlation and relative levels of
the input signals. This method reduces spectral coloration
artefacts which are introduced by known summation methods and
ensures energy preservation in each frequency band.

[0017] The frequency-dependent correction can be applied by
first summing the input signals (using either a linear or a
weighted sum) followed by applying a correction filter, or by
releasing the constraint that the weights for summation (or their
squared values) necessarily sum up to +1, so that they instead sum
to a value that depends on the cross-correlation.

[0018] It should be noted that the invention can be applied to
any system where two or more input channels are combined.

BRIEF DESCRIPTION OF THE DRAWINGS

[0019] Embodiments of the invention will now be described with 
reference to the accompanying drawings, in which: 

[0020] Figure 1 shows a prior art encoder;

[0021] Figure 2 shows a block diagram of an audio system
including the encoder of Figure 1;

[0022] Figure 3 shows the steps performed by a signal summation
component of an audio coder according to a first embodiment of the
invention; and

[0023] Figure 4 shows linear interpolation of the correction
factors m(i) applied by the summation component of Figure 3.

DESCRIPTION OF THE PREFERRED EMBODIMENTS 
[0024] According to the present invention, there is provided an
improved signal summation component (S8'), in particular, for
performing the step corresponding to S8 of Figure 1. Nonetheless,
it will be seen that the invention is applicable anywhere two or
more signals need to be summed. In a first embodiment of the
invention, the summation component adds left and right stereo
channel signals prior to the summed signal S being encoded, step
S9.

[0025] Referring now to Figure 3, in the first embodiment, the
left (L) and right (R) channel signals provided to the summation
component comprise multi-channel segments m1, m2, ... overlapping
in successive time frames t(n-1), t(n), t(n+1). Typically,
sinusoids are updated at a rate of 10 ms and each segment m1, m2,
... is twice the length of the update interval, i.e., 20 ms.

[0026] For each overlapping time window t(n-1), t(n), t(n+1) for
which the L, R channel signals are to be summed, the summation
component uses a (square-root) Hanning window function to combine
each channel signal from overlapping segments m1, m2, ... into a
respective time-domain signal representing each channel for a time
window, step 42.
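
One property that makes a square-root Hanning window convenient for 50 % overlapping frames can be sketched as follows. This is an illustration only: it assumes a periodic Hanning definition and assumes that the same window is applied a second time before overlap-add, neither of which is stated in the text; the function and variable names are hypothetical.

```python
import math

def sqrt_hann(length):
    """Square root of a periodic Hanning window (assumed definition)."""
    return [math.sqrt(0.5 * (1.0 - math.cos(2.0 * math.pi * n / length)))
            for n in range(length)]

sample_rate = 44100
frame_len = sample_rate * 20 // 1000   # 20 ms segment -> 882 samples
hop = frame_len // 2                   # 10 ms update interval -> 441 samples
w = sqrt_hann(frame_len)

# If the window is applied once at analysis and once at synthesis, each
# sample is weighted by w^2. With 50 % overlap, the squared windows of two
# neighbouring frames add up to one at every sample position.
overlap_sum = [w[n] ** 2 + w[n + hop] ** 2 for n in range(hop)]
```

Under these assumptions every entry of `overlap_sum` equals one, so windowing followed by overlap-add leaves an unmodified signal unchanged.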

[0027] An FFT (Fast Fourier Transform) is applied on each time-
domain windowed signal, resulting in a respective complex frequency
spectrum representation of the windowed signal for each channel,
step 44. For a sampling rate of 44.1 kHz and a frame length of
20 ms, the length of the FFT is typically 882. This process results
in a set of K frequency components for both input channels (L(k),
R(k)).
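
The transform length quoted above follows directly from the sampling rate and frame duration. The sketch below computes it and, purely for illustration, applies a direct (slow) discrete Fourier transform to a short toy frame; a real implementation would use an FFT routine, and the toy frame length of 16 samples is an assumption made to keep the example small.

```python
import cmath
import math

def dft(frame):
    """Direct DFT of a real or complex frame; O(n^2), for illustration only."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
            for k in range(n)]

sample_rate = 44100                            # Hz
frame_ms = 20                                  # frame duration in ms
fft_length = sample_rate * frame_ms // 1000    # samples per frame

# Toy frame: a cosine with exactly 3 cycles in 16 samples.
toy = [math.cos(2 * math.pi * 3 * t / 16) for t in range(16)]
spectrum = dft(toy)
# Strongest component among the non-negative-frequency bins 0..8.
peak_bin = max(range(9), key=lambda k: abs(spectrum[k]))
```

Here `fft_length` evaluates to 882, and the toy cosine concentrates its energy in bin 3, illustrating how each frame is turned into a set of complex frequency components.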
[0028] In the first embodiment, the two input channel
representations L(k) and R(k) are first combined by a simple linear
summation, step 46. It will be seen, however, that this could
easily be extended to a weighted summation. Thus, for the present
embodiment, sum signal S(k) comprises:

$$S(k) = L(k) + R(k)$$

Separately, the frequency components of the input signals L(k) and
R(k) are grouped into several frequency bands, preferably using
perceptually-related bandwidths (ERB or BARK scale) and, for each
subband i, an energy-preserving correction factor m(i) is computed,
step 45:

$$m^2(i) = \frac{\sum_{k \in i} \left\{ |L(k)|^2 + |R(k)|^2 \right\}}{2 \sum_{k \in i} |S(k)|^2} = \frac{\sum_{k \in i} \left\{ |L(k)|^2 + |R(k)|^2 \right\}}{2 \sum_{k \in i} |L(k) + R(k)|^2} \qquad \text{(Equation 1)}$$

which can also be written as:

$$m^2(i) = \frac{1}{2} \cdot \frac{\sum_{k \in i} \left\{ |L(k)|^2 + |R(k)|^2 \right\}}{\sum_{k \in i} |L(k)|^2 + \sum_{k \in i} |R(k)|^2 + 2 \rho_{LR}(i) \sqrt{\sum_{k \in i} |L(k)|^2 \sum_{k \in i} |R(k)|^2}} \qquad \text{(Equation 2)}$$

with $\rho_{LR}(i)$ being the (normalized) cross-correlation of the
waveforms of subband i, a parameter used elsewhere in parametric
multi-channel coders and so readily available for the calculations
of Equation 2. In any case, step 45 provides a correction factor
m(i) for each subband i.
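
The two forms of the correction factor can be compared numerically. The sketch below is an illustration under stated assumptions: the band is given as two short lists of complex bins with made-up values, and the normalized cross-correlation is assumed to be the real part of the summed cross-spectrum divided by the geometric mean of the band energies. The function names are hypothetical.

```python
import math

def correction_factor_eq1(L, R):
    """m(i) from Equation 1 for the complex bins L, R of one band."""
    num = sum(abs(l) ** 2 + abs(r) ** 2 for l, r in zip(L, R))
    den = 2.0 * sum(abs(l + r) ** 2 for l, r in zip(L, R))
    return math.sqrt(num / den)

def correction_factor_eq2(L, R):
    """m(i) from Equation 2, using band energies and cross-correlation."""
    e_l = sum(abs(l) ** 2 for l in L)
    e_r = sum(abs(r) ** 2 for r in R)
    # Assumed definition of the normalized cross-correlation of the band.
    rho = sum((l * r.conjugate()).real for l, r in zip(L, R)) / math.sqrt(e_l * e_r)
    return math.sqrt(0.5 * (e_l + e_r) / (e_l + e_r + 2.0 * rho * math.sqrt(e_l * e_r)))

# Illustrative band with three complex bins per channel.
L = [1 + 0j, 0.5j, -0.3 + 0.2j]
R = [0.8 + 0.1j, -0.2 + 0.4j, 0.1 - 0.5j]
m1 = correction_factor_eq1(L, R)
m2 = correction_factor_eq2(L, R)

# Identical channels (correlation +1) give m(i) = 1/2.
same = [1 + 1j, 2 - 1j]
m_same = correction_factor_eq1(same, same)
```

Both functions divide by the energy of the summed band, so they are undefined when the channels cancel exactly within a band; that situation is addressed separately in the description.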

[0029] The next step 47 then comprises multiplying each
frequency component S(k) of the sum signal with a correction filter
C(k):

$$S'(k) = S(k) C(k) = C(k) L(k) + C(k) R(k) \qquad \text{(Equation 3)}$$

[0030] It will be seen from the last component of Equation 3
that the correction filter can be applied to either the summed
signal S(k) alone or each input channel (L(k), R(k)). As such,
steps 46 and 47 can be combined when the correction factor m(i) is
known or performed separately with the summed signal S(k) being
used in the determination of m(i), as indicated by the hashed line
in Figure 3.
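
The equivalence of the two ways of applying the correction filter, and its effect on band energy, can be sketched as follows. The bin values are made up, and the filter is assumed to be constant across the band (C(k) = m(i)), which is a simplification of the interpolated filter.

```python
import math

# Illustrative band with three complex bins per channel.
L = [1 + 0j, 0.5j, -0.3 + 0.2j]
R = [0.8 + 0.1j, -0.2 + 0.4j, 0.1 - 0.5j]

input_energy = sum(abs(l) ** 2 + abs(r) ** 2 for l, r in zip(L, R))
sum_energy = sum(abs(l + r) ** 2 for l, r in zip(L, R))

m = math.sqrt(input_energy / (2.0 * sum_energy))   # Equation 1
C = [m] * len(L)   # assumption: filter held constant within the band

# Equation 3, applied to the sum and to each channel separately.
corrected_sum = [c * (l + r) for c, l, r in zip(C, L, R)]
corrected_channels = [c * l + c * r for c, l, r in zip(C, L, R)]

corrected_energy = sum(abs(s) ** 2 for s in corrected_sum)
```

Under this assumption the corrected band energy equals half the combined input energy, i.e. the mean energy of the two channels, regardless of how the channels were correlated.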

[0031] In the preferred embodiments, the correction factors m(i)
are used for the center frequencies of each subband, while for
other frequencies, the correction factors m(i) are interpolated to
provide the correction filter C(k) for each frequency component (k)
of a subband i. In principle, any interpolation function can be
used; however, empirical results have shown that a simple linear
interpolation scheme suffices (Figure 4).
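
A linear interpolation of the band correction factors into a per-bin filter might look like the sketch below. The band center positions, the factor values, and the choice to hold the filter flat below the first and above the last center are assumptions made for illustration; the text does not specify edge handling.

```python
def interpolate_correction(centers, m, num_bins):
    """Piecewise-linear C(k) through the points (centers[i], m[i]).

    Assumption: C(k) is held constant outside the outermost band centers.
    """
    C = []
    for k in range(num_bins):
        if k <= centers[0]:
            C.append(m[0])
        elif k >= centers[-1]:
            C.append(m[-1])
        else:
            # Index of the last band center at or below bin k.
            j = max(i for i, c in enumerate(centers) if c <= k)
            frac = (k - centers[j]) / (centers[j + 1] - centers[j])
            C.append(m[j] + frac * (m[j + 1] - m[j]))
    return C

centers = [2, 6, 10]      # hypothetical center bins of three subbands
m = [0.5, 0.7, 1.0]       # hypothetical correction factors m(i)
C = interpolate_correction(centers, m, 13)
```

At the band centers the filter reproduces m(i) exactly, and between centers it moves linearly from one factor to the next, which avoids abrupt steps at band edges.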

[0032] Alternatively, an individual correction factor could be
derived for each FFT bin (i.e., subband i corresponds to frequency
component k), in which case no interpolation is necessary. This
method, however, may result in a jagged rather than a smooth
frequency behavior of the correction factors, which is often
undesired due to resulting time-domain distortions.

[0033] In the preferred embodiments, the summation component
then takes an inverse FFT of the corrected summed signal S'(k) to
obtain a time domain signal, step 48. By applying overlap-add for
successive corrected summed time domain signals, step 50, the final
summed signal s1, s2, ... is created and this is fed through to be
encoded, step S9, Figure 1. It will be seen that the summed
segments s1, s2, ... correspond to the segments m1, m2, ... in the
time domain and, as such, no loss of synchronization occurs as a
result of the summation.

[0034] It will be seen that where the input channel signals are
not overlapping signals but rather continuous time signals, then
the windowing step 42 will not be required. Similarly, if the
encoding step S9 expects a continuous time signal rather than an
overlapping signal, the overlap-add step 50 will not be required.
Furthermore, it will be seen that the described method of
segmentation and frequency-domain transformation can also be
replaced by other (possibly continuous-time) filterbank-like
structures. Here, the input audio signals are fed to a respective
set of filters, which collectively provide an instantaneous
frequency spectrum representation for each input audio signal. This
means that sequential segments can, in fact, correspond with single
time samples rather than blocks of samples as in the described
embodiments.

[0035] It will be seen from Equation 1 that there are
circumstances where particular frequency components for the left
and right channels may cancel out one another or, if they have a
negative correlation, they may tend to produce very large
correction factor values $m^2(i)$ for a particular band. In such
cases, a sign bit could be transmitted to indicate that the sum
signal for the component S(k) is:

$$S(k) = L(k) - R(k)$$

with a corresponding subtraction used in Equations 1 or 2.

[0036] Alternatively, the components for a frequency band i
might be rotated more into phase with one another by an angle
$\alpha(i)$. The ITD analysis process S3 provides the (average)
phase difference between (subbands of the) input signals L(k) and
R(k). Assuming that for a certain frequency band i, the phase
difference between the input signals is given by $\alpha(i)$, the
input signals L(k) and R(k) can be transformed to two new input
signals L'(k) and R'(k) prior to summation according to the
following:

$$L'(k) = e^{j c \alpha(i)} L(k)$$
$$R'(k) = e^{-j (1 - c) \alpha(i)} R(k)$$

with c being a parameter which determines the distribution of phase
alignment between the two input channels ($0 \le c \le 1$).
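
The phase rotation can be illustrated with a short sketch. It assumes a sign convention that the text does not spell out, namely that the angle for the band is the phase by which R leads L, estimated from the summed cross-spectrum; the function name and test values are hypothetical.

```python
import cmath

def align_band(L, R, c=0.5):
    """Rotate the bins of one band into phase before summation.

    Assumed convention: alpha is the average phase by which R leads L,
    taken as the phase of sum(R(k) * conj(L(k))) over the band.
    """
    alpha = cmath.phase(sum(r * l.conjugate() for l, r in zip(L, R)))
    L_rot = [cmath.exp(1j * c * alpha) * l for l in L]
    R_rot = [cmath.exp(-1j * (1.0 - c) * alpha) * r for r in R]
    return L_rot, R_rot, alpha

# R is a copy of L advanced by 2.5 rad, so the plain sum nearly cancels.
L = [1 + 0j, 2j]
R = [l * cmath.exp(2.5j) for l in L]
L_rot, R_rot, alpha = align_band(L, R, c=0.5)

energy_before = sum(abs(l + r) ** 2 for l, r in zip(L, R))
energy_after = sum(abs(l + r) ** 2 for l, r in zip(L_rot, R_rot))
```

With c = 0.5 both channels are rotated by half the phase difference towards each other; c = 0 would leave L untouched and rotate only R, and c = 1 the reverse.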
[0037] In any case, it will be seen that where, for example, two
channels have a correlation of +1 for a sub-band i, then $m^2(i)$
will be 1/4 and so m(i) will be 1/2. Thus, the correction factor
C(k) for any component in the band i will tend to preserve the
original energy level by tending to take half of each original
input signal for the summed signal. However, as can be seen from
Equation 1, where a frequency band i of a stereo signal includes
spatial properties, the energy of the signal S(k) will tend to get
smaller than if they were in phase, while the sum of the energies
of the L, R signals will tend to stay large and so the correction
factor will tend to be larger for those signals. As such, overall
energy levels in the sum signal will still be preserved across the
spectrum, in spite of frequency-dependent correlation in the input
signals.
[0038] In a second embodiment, the extension towards multiple
(more than two) input channels is shown, combined with possible
weighting of the input channels mentioned above. The frequency-
domain input channels are denoted by $X_n(k)$, for the k-th
frequency component of the n-th input channel. The frequency
components k of these input channels are grouped in frequency bands
i. Subsequently, a correction factor m(i) is computed for subband i
as follows:

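$$m^2(i) = \frac{\sum_{n} \sum_{k \in i} |w_n(k) X_n(k)|^2}{N \sum_{k \in i} \left| \sum_{n} w_n(k) X_n(k) \right|^2}$$

where N is the number of input channels, so that N = 2 with unit
weights reproduces Equation 1.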

[0039] In this equation, $w_n(k)$ denotes the frequency-dependent
weighting factors of the input channels n (which can simply be set
to +1 for linear summation). From these correction factors m(i), a
correction filter C(k) is generated by interpolation of the
correction factors m(i) as described in the first embodiment. Then
the mono output channel S(k) is obtained according to:

$$S(k) = C(k) \sum_{n} w_n(k) X_n(k)$$

[0040] It will be seen that using the above equations, the
weights of the different channels do not necessarily sum to +1;
however, the correction filter automatically corrects for weights
that do not sum to +1 and ensures (interpolated) energy
preservation in each frequency band.
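
A sketch of the multi-channel correction factor follows. It assumes that the energy of the weighted sum is normalized by the number of input channels, which is the choice that reduces to Equation 1 for two channels with unit weights; that normalization, the function name, and the test values are assumptions rather than details confirmed by the text.

```python
import math

def multichannel_correction(X, w):
    """Correction factor m(i) and weighted sum for one band.

    X[n][k]: complex bin k of channel n; w[n][k]: its weight.
    Assumption: the denominator is scaled by the number of channels.
    """
    num_channels = len(X)
    num_bins = len(X[0])
    num = sum(abs(w[n][k] * X[n][k]) ** 2
              for n in range(num_channels) for k in range(num_bins))
    mixed = [sum(w[n][k] * X[n][k] for n in range(num_channels))
             for k in range(num_bins)]
    den = num_channels * sum(abs(s) ** 2 for s in mixed)
    return math.sqrt(num / den), mixed

# Three identical channels with unit weights: the plain sum has nine
# times the energy of one channel, so m(i) comes out as 1/3.
chan = [1 + 1j, 2 - 1j]
X = [chan, chan, chan]
w = [[1.0, 1.0] for _ in X]
m, mixed = multichannel_correction(X, w)
S = [m * s for s in mixed]   # constant filter C(k) = m(i) within the band

channel_energy = sum(abs(x) ** 2 for x in chan)
output_energy = sum(abs(s) ** 2 for s in S)
```

With this normalization the corrected output carries the mean energy of the weighted input channels, whatever the weights sum to.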
ABSTRACT OF THE DISCLOSURE

A method of generating a monaural signal (S) includes a combination
of at least two input audio channels (L, R). Corresponding
frequency components from respective frequency spectrum
representations for each audio channel (L(k), R(k)) are summed (46)
to provide a set of summed frequency components (S(k)) for each
sequential segment. For each frequency band (i) of each sequential
segment, a correction factor (m(i)) is calculated (45) as function
of a sum of energy of the frequency components of the summed signal
in the band ($\sum_{k \in i} |S(k)|^2$) and a sum of the energy of
the frequency components of the input audio channels in the band
($\sum_{k \in i} \{ |L(k)|^2 + |R(k)|^2 \}$). Each summed frequency
component is corrected as a function of the correction factor
(m(i)) for the frequency band of the component.

Fig. 3